Congestion control for connectionless traffic in data networks via alternate routing.



A congestion control scheme for connectionless networks relieves congestion by routing a portion of
traffic on a congested primary path onto a predefined alternate path constructed such that
loop-freedom is guaranteed. Explicit care is taken to avoid spreading congestion onto alternate paths. The
control actions are taken in a completely distributed manner, based on local measurements only, and
therefore no signaling messages need to be exchanged between nodes.

If desired, lower loss priority may be assigned to alternate routed traffic. Congestion is monitored
locally and thresholds are defined to declare the onset and abatement of congestion. The present invention
affords at least an order of magnitude improvement in end-to-end cell blocking under sustained
focussed overload.






Jouve, 18. rue Saint-Denis, 75001 PARIS 



EP 0 465 090 A1 



Technical Field 

The present invention relates generally to data communications and, in particular, to a congestion control
scheme for connectionless traffic in data networks.

Background of the Invention 

Connectionless data networks (such as the ARPANET network) permit the interchange of packetized data
between interconnected nodes without the need for fixed or centralized network routing administration. Each
node examines packet header information and makes routing decisions based only upon locally available
information, without explicit knowledge of where the packet originated or of the entire route to the destination node.
In this environment, traditional congestion control strategies such as window flow control and per virtual circuit
buffering and pacing cannot be used because of the absence of end-to-end acknowledgements.

One congestion control approach that has been implemented in some connectionless networks is the use
of choke messages. In this method, a congested node sends feedback messages to other nodes, asking them
not to send traffic to it until further notification. There are several drawbacks to this approach. First, by the time
the choke message reaches the offending node, a substantial amount of traffic will already have been transmitted.
For example, in a network consisting of 150 Mbps trunks, a choke packet sent on a 1000 mile long link takes 10
msecs of propagation time. In this time, 1.5 Mbits are already in transit and will contribute to existing congestion.
Secondly, in connectionless networks, there is no knowledge of the path traversed by a packet before arriving
at a given node; therefore, choke messages may have to be sent to all the neighbors, including those that do
not contribute to congestion. This will lead to under-utilization of the network. Another difficulty with this method
is the action taken by a node upon receiving a choke packet. If it drops all packets headed towards the
congested node, then subsequent retransmissions will contribute to increased congestion. Since there is no
connection-oriented layer that the network interacts with, it is difficult to stop traffic at the sources responsible for
causing congestion. Therefore choke messages do not appear to be an effective means of congestion control
in connectionless networks.
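
The figure quoted in the example above is simply the trunk rate multiplied by the propagation delay. A minimal sketch of that arithmetic follows; the function name is illustrative only, and the two input values are the ones stated in the text.

```python
def bits_in_flight(link_rate_bps: float, propagation_delay_s: float) -> float:
    """Bits already on the link by the time a choke message has propagated."""
    return link_rate_bps * propagation_delay_s


trunk_rate_bps = 150e6  # 150 Mbps trunk, as in the example above
delay_s = 10e-3         # 10 msecs of propagation time, as in the example above

# Roughly 1.5 Mbits are in transit before the choke message can take effect.
print(bits_in_flight(trunk_rate_bps, delay_s) / 1e6, "Mbits")
```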

Certain other approaches that have been tried in connectionless networks such as ARPANET involve
changing network routing in response to changes in traffic conditions, by dynamically recomputing paths
between nodes in a completely distributed fashion. This can be illustrated by considering the RIP scheme which
has been tried in ARPANET. In RIP, each node stores the entire network topology, and periodically transmits
routing update messages to its neighboring nodes. The routing update messages provide reachability
information which tells each neighboring node how the originating node can reach the other nodes in the network,
together with some measure of the minimum distance to the various nodes. The measure of distance used is
different in different versions of RIP. The original RIP protocol used hop-counts to measure distance, while
subsequent modifications use delay estimates to reach a destination as a measure of distance.

The RIP scheme has several serious drawbacks. First, a large amount of information
must be exchanged between nodes in order to ensure consistent routing changes, and this itself may
consume significant network resources. Second, because paths are dynamically recomputed, there is serious
potential for problems such as packet looping, packet missequencing and route oscillations. Also, because of
propagation delay, the information exchanged between nodes may be outdated, and hence may not be reliable
for changing routing. This problem is especially serious in high speed networks (> 45 Mbps).

A second dynamic routing protocol called IGRP uses a composite metric which includes propagation delay,
path bandwidth, path utilization and path reliability, as a measure of distance. If the minimum distance path is
different from the one currently in use, then all the traffic is switched to the newly computed shortest path. If a
set of paths is "equivalent", load balancing is used.

When dynamic changes in routing are occasioned by the IGRP protocol, traffic shifts from one path to
another, so that congestion may be caused on the new path. Subsequent distance and shortest path
computation may then switch the traffic back onto the original path. In this manner each path would experience
oscillations in offered traffic and the end result may well be that neither path is fully utilized. This problem may only
be partly alleviated by averaging the distance measurements over an interval of time before transmitting to the
other nodes.

A third, very recent proposed enhancement to the ARPANET routing protocol, described in "An Extended
Least-Hop Distributed Routing Algorithm," written by D. J. Nelson, K. Sayood, and H. Chang, published in IEEE
Transactions on Communications, Vol. 38, No. 4, April 1990, pages 520-528, augments the set of available
shortest path routes to carry packets to a given destination by including routes that are one hop longer than
the shortest path routes. Each node maintains an estimate of the total delay involved in reaching every
destination. The route which has the minimum delay to a given destination is then picked from the set of routes
available to carry traffic to that destination. Although this approach shows considerable improvement over the
existing ARPANET routing, it also has several disadvantages. First, the optimal, minimum delay path has to be
chosen for each packet, leading to increased processing in the switch. Second, at any given time, only one path
is active and hence there is no notion of load balancing. All traffic is routed on the same path until a path with
a better delay estimate is available. Third, nodes need to exchange delay information and hence some form
of signaling between nodes is necessary. Lastly, only paths that are one hop longer are considered in addition
to the shortest paths. Thus, some longer idle paths will not be chosen, even though they could have successfully
carried the traffic.

Yet another possibility for dealing with congestion is to try to reduce the impact of its consequences. For
example, one way of avoiding packet losses due to buffer overflow is to increase the link buffer sizes. There
is a serious drawback to this approach: if the buffer size is made very large, cells will experience high queueing
delays and end-to-end performance may be affected to the extent that the end systems may time out and
retransmit. On the other hand, if the buffer size is designed to keep the maximum queueing delay within
acceptable bounds, then since the buffer occupancy tends to increase exponentially as the link utilization approaches
unity, buffers will eventually overflow in the face of sustained focussed overload on the link and the resulting
cell losses will cause the end systems to retransmit. Thus, increasing the buffer size is not a viable congestion
control strategy.

Summary of the Invention 

In accordance with the present invention, congestion caused by transient focussed overloads in
connectionless networks is relieved by routing a portion of traffic intended for a congested primary path onto a
predefined alternate path. An explicit algorithm is used for constructing alternate paths in such a way that
loop-freedom is guaranteed. Briefly, this is done by organizing the nodes that neighbor a given node into layers such
that nodes that are the same distance (in hops) from a given destination are in the same layer. A weight is then
assigned to each possible path between (a) the given node and each neighbor in the same layer, and (b) each
neighbor and a node in a closer layer (in hops) to which the neighboring node is connected. The pairwise sum
of the weights for each combination of paths is then computed and the alternate path is determined as the path
having the minimum sum. Furthermore, care is taken to avoid spreading congestion onto alternate paths by
marking alternately routed packets, so that they are more readily dropped in the event that congestion is again
encountered at nodes further along the alternate path. By appropriately choosing threshold values for initiating
a transition to an alternate route and for reverting to a primary route, route oscillations can be avoided. The
routing determinations and network control actions are taken in a completely distributed manner based on local
measurements only, and therefore no signaling messages or routing data need to be exchanged between
network nodes. The invention is most useful in conjunction with data networks where traffic tends to be very bursty,
because when some paths are busy, it is quite likely that others are relatively idle. Accordingly, when there is
non-coincidence of overloads on various parts of the network, our invention provides the greatest benefits.
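
The role of the two thresholds can be sketched as a simple hysteresis loop. This is only an illustration under stated assumptions: the class name, the use of queue length as the local measurement, and the numeric thresholds are all hypothetical and are not taken from the text, which says only that separate thresholds govern the transition to the alternate route and the reversion to the primary route.

```python
class CongestionMonitor:
    """Local congestion state with separate onset and abatement thresholds."""

    def __init__(self, onset: int, abatement: int) -> None:
        if abatement >= onset:
            raise ValueError("abatement threshold must be below onset threshold")
        self.onset = onset          # declare congestion at or above this level
        self.abatement = abatement  # clear congestion at or below this level
        self.congested = False

    def update(self, queue_length: int) -> bool:
        """Feed one local measurement; return True while alternate routing is active."""
        if not self.congested and queue_length >= self.onset:
            self.congested = True
        elif self.congested and queue_length <= self.abatement:
            self.congested = False
        return self.congested


# Hypothetical thresholds and measurements, chosen only for illustration.
monitor = CongestionMonitor(onset=80, abatement=40)
print([monitor.update(q) for q in [10, 85, 60, 45, 39, 50]])
```

Because the state changes only when a measurement crosses the far threshold, readings that hover between the two values do not flip the route back and forth.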

Brief Description of the Drawing 

The present invention will be more fully appreciated by reference to the following detailed description, when
read in light of the accompanying drawing in which:

Fig. 1 is a diagram illustrating the interconnection of an exemplary network having seven nodes;
Figs. 2-4 illustrate the "exclusionary trees" developed by one of the nodes in the network of Fig. 1;
Figs. 5-7 illustrate the "exclusionary trees" received by one of the nodes in the network of Fig. 1;
Figs. 8-10 illustrate the "exclusionary trees" of Figs. 5-7, respectively, which have been redrawn so that
each successive node descending from a root node is placed at the same vertical level;
Fig. 11 illustrates the result when the "exclusionary trees" of Figs. 8-10 are merged;
Fig. 12 illustrates the "layering" of nodes in a network with respect to a destination node;
Fig. 13 illustrates primary and alternate paths between some of the nodes in Fig. 12;
Figs. 14 and 15 illustrate undesirable single link looping between a pair of nodes;
Fig. 16 illustrates one example of our alternate routing technique as applied to a four node network in which
three nodes are located in a first layer and the fourth node is located in a second layer;
Fig. 17 is a redrawn version of Fig. 16 in which the fourth node is replaced by three "equivalent" nodes;
Fig. 18 illustrates alternate and primary routing paths between a first node i and a second node j;
Fig. 19 illustrates alternate routing between the three nodes in the first layer of Fig. 17 that would lead to
undesirable looping;
Fig. 20 illustrates multiple nodes in a network, and arrangement of such nodes in k layers;
Fig. 21 is a flow chart illustrating the overall process for generating alternate routes in accordance with the
invention;
Figs. 22 and 23 are a flow chart (in two parts) that illustrates the process for performing layering step 2103
of Fig. 21;
Figs. 24 and 25 are a flow chart (in two parts) that illustrates in more detail the process of generating
alternate routes in accordance with the invention;
Fig. 26 is a flow chart illustrating the process for giving higher priority to packets that are routed on
uncongested routes and for allowing marked packets that travel on alternate routes because of congestion to be
discarded in the event that heavy traffic is encountered;
Fig. 27 is a typical functional architecture for the nodes in the networks of Figs. 1-20;
Fig. 28 is a functional diagram of the arrangement of nodal processor 2730 of Fig. 27;
Fig. 29 is a three node network model used in simulations of the present invention;
Fig. 30 is a queueing model corresponding to the network of Fig. 29;
Fig. 31 is a graph illustrating blocking probability with and without alternate routing as a function of offered
load; and
Fig. 32 is a rescaled version of the graph of Fig. 31.

Detailed Description 

In order to fully understand the alternate routing technique of the present invention, it is instructive to first
consider one technique that can be employed to determine the primary path taken by messages traveling
between nodes in a connectionless network under normal (i.e., non-congested) conditions. This technique is
distributed adaptive minimum spanning tree routing, sometimes also known as "exclusionary tree" routing, details
of which can be found in Patent No. 4,466,060 issued to G. G. Riddle on August 14, 1984. Other routing
techniques are described in D. E. Comer's book, "Internetworking With TCP/IP: Principles, Protocols and
Architecture," Chapter 15: Interior Gateway Protocols, Prentice Hall, 1988. The overall objective of the exclusionary
tree technique is to maintain a table of correct shortest paths to all destinations at each node of the network.
For this purpose, routing tables are initially constructed and updated whenever there are topological changes
in the network, as for example, when a node or link is added or deleted. The update procedures are implemented
at each node independently, in a distributed fashion. The resulting routing tables are designed to yield minimum
hop paths to all destinations in such a way that there is no looping.

Two principal steps are at the heart of the exclusionary tree routing technique: (1) each node sends an
exclusionary tree to each of its neighbors, and (2) a prescribed procedure is employed at each of the nodes to
merge the received exclusionary trees into a routing table. These two steps are repeated at each node until
the routing tables converge.

The exclusionary tree routing technique can be best described through the following example. Consider
the network consisting of seven nodes 1-7 shown in Fig. 1. Each node sends an exclusionary tree to each of
its neighbors. An exclusionary tree is the shortest path tree obtained after deleting all links connected to the
receiving node. Figs. 2-4 illustrate the exclusionary trees sent by node 1 to its neighbors, namely nodes 6, 5
and 2, respectively. Figs. 5-7 show the exclusionary trees received by node 1 from its neighbors, nodes 6, 5
and 2, respectively. The received exclusionary trees are each first redrawn with their nodes descending from
the root, each successive node being placed at a vertical level corresponding to its distance in hops from node
1, as shown in Figs. 8-10. The merged tree for node 1 shown in Fig. 11 is obtained by merging the exclusionary
trees of Figs. 8-10 received by node 1 from its neighbors, according to the following procedure. The received
exclusionary trees' nodes at a distance of one hop are visited from left to right (in the example, node 6, then
node 5, then node 2) and placed in the merged tree of Fig. 11. Next, nodes at a distance of two hops are
visited in the same order (left to right) and are attached to their parent nodes, if they are not already there at
a lesser distance. This procedure is repeated successively to create a merged tree. If the node of interest is
present in more than one received exclusionary tree at the same distance in hops, then each root node is
represented in the merged tree, resulting in multiple entries for nodes that have multiple equal length routes. This
situation did not occur in Fig. 11. Whenever multiple equal length routes exist, traffic is distributed over all such
routes so as to achieve load balancing. It is to be noted here that other techniques can also be used to determine
the primary network routing used in the absence of congestion.
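
The definition of an exclusionary tree given above (the shortest path tree obtained after deleting all links connected to the receiving node) can be sketched with a breadth-first search. The topology below is hypothetical, since the links of Fig. 1 are not listed in the text, and the function name and the parent-map representation are assumptions made for illustration. The sketch keeps a single parent per node, so it does not represent multiple equal length routes.

```python
from collections import deque


def exclusionary_tree(adjacency, sender, receiver):
    """Fewest-hop tree rooted at `sender`, ignoring every link that touches `receiver`.

    Returns a dict mapping each reachable node to its parent (None for the root).
    """
    parent = {sender: None}
    queue = deque([sender])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adjacency[node]):
            # Skipping the receiver is equivalent to deleting all of its links.
            if neighbor == receiver or neighbor in parent:
                continue
            parent[neighbor] = node
            queue.append(neighbor)
    return parent


# Hypothetical seven-node topology (NOT the network of Fig. 1).
adjacency = {
    1: {2, 5, 6}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5, 7},
    5: {1, 4}, 6: {1, 7}, 7: {4, 6},
}
print(exclusionary_tree(adjacency, sender=1, receiver=2))
```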

In accordance with our invention, during times of congestion, some fraction of the packets normally routed
on primary routes are instead routed on secondary or alternate paths that are lightly loaded. The manner in
which alternate routes are selected will be better understood by first considering an arbitrary network which is
depicted in the form of a layered architecture in Fig. 12. The layering in Fig. 12 is with respect to destination
node D, such that nodes 1231-1233 in layer k (k is an integer) have at least one k-hop shortest path to D. This
means that every node in layer k must have at least one link connecting it to a node in layer (k-1). If a node in
layer k is connected to more than one node (1221-1223) in layer (k-1), then it has more than one k-hop shortest
path to D. These are precisely the multiple shortest paths constructed by the exclusionary tree routing algorithm
described above. By exploiting the connectivity between nodes 1231-1233 in layer k, our technique is used
to generate loop-free alternate paths to D which are at least of length (k+1) hops. There are two ways of doing
this. Both assume that only shortest path primary routes are permitted.
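
The layering just described groups nodes by their shortest hop distance to D. A minimal sketch follows, using a breadth-first search over a global adjacency map; that global view is an assumption made for brevity, as are the function name and the small topology, which is not the network of Fig. 12.

```python
from collections import deque


def layers_toward(adjacency, destination):
    """Group nodes by fewest-hop distance to `destination` (layer 0 is the destination)."""
    hops = {destination: 0}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    layers = {}
    for node, k in hops.items():
        layers.setdefault(k, set()).add(node)
    return layers


# Hypothetical five-node network used only for illustration.
adjacency = {
    "D": {"a", "b"},
    "a": {"D", "b", "c"},
    "b": {"D", "a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c"},
}
print(layers_toward(adjacency, "D"))
```

In this example every node in layer 2 has a link to some node in layer 1, as the layering requires.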

In the first method, if all nodes are numbered, and if we let nodes i and j (i and j are integers) belong to
layer k and let nodes i and j be connected by a link, then node i can alternate route packets intended for
destination D via node j if i<j. This method is loop-free, because the primary routes are hierarchical shortest path
routes, while the secondary (alternate) routes essentially create a hierarchy within layer k. For example, Fig.
13 shows 4 nodes numbered 1301-1304 in layer k connected to 4 nodes numbered 1311-1314 in layer (k-1).

In Fig. 13, routing choices marked 1 are primary routes and routing choices marked 2 are secondary routes.
It is clear that no inter-layer looping is possible since there is no downward routing: a node in layer (k-1) cannot
route to a node in layer k. No intra-layer looping is possible, because node 1304 cannot alternate route to any
of the other (lower numbered) nodes. This first method is simple to implement, but has the disadvantage that
the highest numbered node (e.g., node 1304) in every layer is denied an alternate route. This disadvantage is
overcome, albeit at the cost of added complexity, in the second method.
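
The first method reduces to a one-line filter. In the sketch below the node numbers follow Fig. 13, but the assumption that the four nodes of layer k are fully interconnected is made only for illustration, as is the function name.

```python
def first_method_alternates(node, same_layer_neighbors):
    """Same-layer neighbors that `node` may alternate route to: higher-numbered ones only."""
    return sorted(j for j in same_layer_neighbors if j > node)


# Node numbers follow Fig. 13; full connectivity inside layer k is an assumption.
layer_k = [1301, 1302, 1303, 1304]
for i in layer_k:
    neighbors = [j for j in layer_k if j != i]
    print(i, first_method_alternates(i, neighbors))
```

Every alternate hop strictly increases the node number, so a packet cannot return to a node it has already visited within the layer, and the highest numbered node is left with an empty list.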

In the second method, we require two additional pieces of information. These are:

(i) the ability to avoid single link loops of the form shown in Fig. 14, wherein node i must recognize that a
packet was routed to it by node j, and must prevent the packet from going back to node j. This is necessary
to avoid looping between nodes i and j when i and j route to each other on a second choice basis, as shown
in Fig. 15; i.e., when the primary paths out of both nodes i and j are unavailable (due to congestion), packets
must not be allowed to loop between i and j, but should be dropped.

(ii) Every node i must be assigned weights w(i,j) with respect to all other nodes j to which it is connected.
Further, the weights must be chosen so that:

(a) they are symmetric, i.e., w(i,j) = w(j,i) for all i and j, and

(b) w(i,j) + w(j,k) is unique, in the sense that w(i,j) + w(j,k) = w(i,l) + w(l,m) => j=l and k=m.

This condition means that for any two nodes that have at least one two-hop path connecting them, there
is a unique minimum weight two-hop path connecting the two nodes. One way of satisfying condition (b) is to
choose weights so that the pairwise sum is unique, i.e., such that no two sums are the same. As will be described
below, the weight information can be transmitted to each node together with the exclusionary tree routing
information. The reason why the assignment of these weights is necessary will also be explained below.
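
Conditions (a) and (b) can be checked mechanically for a candidate weight assignment. The sketch below is one possible check, not a procedure described in the text: the function name and the dictionary layout are assumptions, and two-hop paths that return to their starting node are excluded from the comparison as a further assumption.

```python
def weights_are_valid(weights):
    """Check symmetry (a) and uniqueness of two-hop sums from each node (b).

    `weights` maps ordered pairs (i, j) to w(i, j), one entry per direction of each link.
    """
    neighbors = {}
    for (i, j) in weights:
        neighbors.setdefault(i, set()).add(j)
    # (a) symmetry: w(i, j) must equal w(j, i).
    for (i, j), w in weights.items():
        if weights.get((j, i)) != w:
            return False
    # (b) from every node i, all two-hop sums w(i, j) + w(j, k) must be distinct.
    for i in neighbors:
        sums = [weights[(i, j)] + weights[(j, k)]
                for j in neighbors[i] for k in neighbors[j] if k != i]
        if len(sums) != len(set(sums)):
            return False
    return True


def both_directions(link_weights):
    """Expand {(i, j): w} into a dict that also holds (j, i): w."""
    full = dict(link_weights)
    full.update({(j, i): w for (i, j), w in link_weights.items()})
    return full


# Hypothetical triangle of nodes 1, 2, 3.
print(weights_are_valid(both_directions({(1, 2): 1, (2, 3): 2, (1, 3): 4})))
print(weights_are_valid(both_directions({(1, 2): 1, (2, 3): 2, (1, 3): 1})))
```

In the second assignment the paths 1-2-3 and 1-3-2 both sum to 3, so condition (b) fails.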

Under the above conditions, the fact that our alternate routing technique is loop-free can be demonstrated
as follows:

Let nodes 1, 2, ..., m be in layer k and nodes 1', 2', 3', ..., m' in layer (k-1). The nodes in layer (k-1) may be
repeated and are not necessarily unique (for notational convenience). Let us suppose that node i in layer k is
connected to node i' in layer (k-1). There is no loss of generality in doing this because, even if layer (k-1) has a
single node, it can be repeated m times. For example, Fig. 16, which shows links between nodes 1-3 of layer
k and node 4 of layer k-1, may be redrawn as Fig. 17, in which node 1 is linked to node 1', node 2 is linked to
node 2' and node 3 is linked to node 3', as long as nodes 1', 2' and 3' are each "equal" to node 4.

The route i->i' is always the primary route from node i, for all packets to a particular destination D (with
respect to which the network has been layered). With this notation in mind, our loop-free alternate routing
technique may be expressed in the following manner:

Routing Rule: Let nodes i, j and f belong to layer k and nodes j' and f' belong to layer k-1. Then, node i alternate
routes to node j if and only if

$$ w(i,j) + w(j,j') = \min_{f,\,f'} \left[ w(i,f) + w(f,f') \right] \tag{1} $$

Equation 1 is illustrated diagrammatically in Fig. 18, in which nodes i, j and j' are shown. In that figure, the link
between nodes i and j is marked 2, indicating that this is the secondary path from node i to node j in layer k;
the link between nodes j and j' is marked 1, indicating that this is the primary path from node j in layer k to node
j' in layer k-1. In accordance with our technique, w(i,j) + w(j,j') is then the unique minimum weight 2-hop path to
get from node i to any node in layer (k-1).
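
The routing rule amounts to a minimum search over two-hop sums. The sketch below applies it at a single node; the function name, the data layout and the numeric weights are hypothetical, while the node labels follow the layout of Fig. 17.

```python
def choose_alternate(i, same_layer, closer_layer, w):
    """Pick the same-layer neighbor j (and its j') minimizing w(i, j) + w(j, j').

    same_layer:   neighbors j of node i that are in layer k
    closer_layer: dict mapping each j to its neighbors j' in layer k-1
    w:            dict keyed by frozenset({a, b}), so that w(a, b) = w(b, a)
    Returns (sum, j, j'), or None when node i has no same-layer neighbor.
    """
    best = None
    for j in same_layer:
        for j_prime in closer_layer.get(j, ()):
            total = w[frozenset((i, j))] + w[frozenset((j, j_prime))]
            if best is None or total < best[0]:
                best = (total, j, j_prime)
    return best


# Hypothetical weights on the layout of Fig. 17 (nodes 1-3 in layer k, 1'-3' in layer k-1).
w = {
    frozenset((1, 2)): 1.0, frozenset((1, 3)): 2.0, frozenset((2, 3)): 4.0,
    frozenset((1, "1'")): 8.0, frozenset((2, "2'")): 16.0, frozenset((3, "3'")): 32.0,
}
closer = {1: ["1'"], 2: ["2'"], 3: ["3'"]}
print(choose_alternate(1, [2, 3], closer, w))
```

Keying the weights by an unordered pair builds in the symmetry condition w(i,j) = w(j,i).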

The loop-free property of our technique can be demonstrated by first considering the case when m=3, i.e.,
three nodes in layer k. Assume that nodes in layer k are fully connected. The corresponding network is shown
in Fig. 17. The connectivity between nodes in layer (k-1) is not important and, hence, is not shown. The only
possibility of looping occurs if each node in layer k alternate routes to a node in layer k that has not previously
served as an alternate. For example, the situation illustrated in Fig. 19 is a loop, because node 1 alternate routes
(marked "2") to node 2, node 2 alternate routes to node 3, and node 3 alternate routes back to node 1. Using
our technique, such a loop cannot occur, because if node 1 alternate routes to node 2, and node 2 alternate
routes to node 3, then node 3 must necessarily alternate route to node 2, so that a loop cannot occur. Now,
the fact that node 1 alternate routes to node 2 implies that

$$ w(1,2) + w(2,2') < w(1,3) + w(3,3') \tag{2} $$

Next, the fact that node 2 alternate routes to node 3 implies that

$$ w(2,3) + w(3,3') < w(2,1) + w(1,1') \tag{3} $$

Adding inequalities (2) and (3) and noting that w(i,j) = w(j,i), we get

$$ w(3,2) + w(2,2') < w(3,1) + w(1,1') \tag{4} $$

which implies that node 3 alternate routes to node 2.

Thus, a three link loop cannot occur. However, a single-link loop may occur and hence the nodes must
have the ability to recognize and prevent a single-link loop. Such a capability can be implemented simply in
each node by preventing messages or packets from departing from the node on the same link that they arrived
on. This is discussed in more detail below.
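
The three-node argument can also be checked numerically. The sketch below draws random symmetric weights for the fully connected layer of Fig. 17, lets each node pick its alternate by the minimum two-hop sum, and counts how often the choices form a three link loop; the function names, the random weights and the trial count are assumptions made for illustration. Consistent with inequalities (2) to (4), no such loop should be found.

```python
import random

NODES = (1, 2, 3)


def alternate_choice(i, w_intra, w_down):
    """Same-layer node j minimizing w(i, j) + w(j, j')."""
    others = [j for j in NODES if j != i]
    return min(others, key=lambda j: w_intra[frozenset((i, j))] + w_down[j])


def has_three_link_loop(w_intra, w_down):
    """True if the alternate choices visit all three nodes and return to the start."""
    nxt = {i: alternate_choice(i, w_intra, w_down) for i in NODES}
    second = nxt[nxt[1]]
    return second != 1 and nxt[second] == 1


def count_loops(trials, seed=0):
    rng = random.Random(seed)
    loops = 0
    for _ in range(trials):
        # Symmetric intra-layer weights w(i, j) and downward weights w(j, j').
        w_intra = {frozenset(p): rng.random() for p in ((1, 2), (1, 3), (2, 3))}
        w_down = {i: rng.random() for i in NODES}
        loops += has_three_link_loop(w_intra, w_down)
    return loops


print(count_loops(10_000))
```

Single-link loops (for example node 1 choosing node 2 while node 2 chooses node 1) do occur in these trials, which is why the separate single-link check described above is still needed.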

It should be noted that if the weights w(i,j) are chosen to be the actual distance d(i,j) between nodes i and
j, then our invention leads to shortest distance alternate routing, which would be very important in a
geographically dispersed network. However, while the symmetry property, viz., d(i,j) = d(j,i), is satisfied, the uniqueness
property is not guaranteed. To ensure uniqueness, the internodal distances may have to be infinitesimally
perturbed so that if

$$ d(i,j) + d(j,j') = d(i,k) + d(k,k') \tag{5} $$

then d(i,j) is changed to d(i,j) + ε, where ε is an arbitrarily small number. However, it may be noted that since
d(i,j) are real numbers, practically speaking the uniqueness condition is generally satisfied. The distance
information can easily be provided to each node when the distributed shortest path route is determined, by providing
V,H coordinates. If (V_i, H_i) and (V_j, H_j) are coordinates for two connected nodes i and j, the distance d(i,j) is given by

$$ d(i,j) = \sqrt{(V_i - V_j)^2 + (H_i - H_j)^2} $$



Other fully distributed techniques can be used to find a mapping between the nodes i, j and j' used in
alternate routing and the weights w(i,j), w(j,j') associated with the links between the nodes. For example, if each
node i, j, j' has a unique integer number, then w(i,j) can be arbitrarily defined as (i^q + j^q) and w(j,j') can be likewise
defined as (j^q + j'^q), where q is a suitably chosen integer. Other mappings (i,j) -> w(i,j) such that w(i,j) = w(j,i)
and w(i,j) + w(j,j') = w(i,f) + w(f,f') => j=f can also be found. However, weight assignments may also be centrally
administered, and the appropriate weights periodically downloaded to each node when there is a topological
change in the network, without significantly degrading the performance of the network.
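
The two weight definitions mentioned above (internodal distance from V,H coordinates, and the sum of q-th powers of the node numbers) can be sketched as follows. The function names and sample values are hypothetical, and the sketch does not check that a given q actually makes the pairwise sums unique.

```python
import math


def distance_weight(v_i, h_i, v_j, h_j):
    """d(i, j) computed from the V,H coordinates of nodes i and j."""
    return math.hypot(v_i - v_j, h_i - h_j)


def power_weight(i, j, q):
    """w(i, j) = i**q + j**q for integer node numbers i and j."""
    return i ** q + j ** q


# Both definitions are symmetric in i and j by construction.
print(distance_weight(0, 0, 3, 4))
print(power_weight(2, 3, q=2), power_weight(3, 2, q=2))
```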

It is clear from the previous discussion that the topological information needed to construct loop-free
alternate routing in accordance with our invention is the layering of the network with respect to every destination
node. This information can be readily obtained from the exclusionary tree information which is already available
in each node. Consider the layering with respect to destination node D shown in Fig. 20. Suppose that the node
at which we are constructing the routing table is node S in layer k. Let node S be connected to nodes S1, ..., SK
in layer k. Now, node S knows through its own primary routing table (constructed using exclusionary tree
routing) that it is in layer k with respect to D. It also knows from the exclusionary trees received from S1, ..., SK that
they are also in layer k with respect to D. There may be other nodes in layer k which S does not know about.
But this does not matter as S is not connected to those nodes and hence could not alternate route to any of
them. The key is that the exclusionary tree information is sufficient for a node to determine which of its neighbors
are in the same layer as itself with respect to any given destination node. (This is unlike a centralized algorithm,
in which all nodes have global knowledge of the network topology and, hence, every node knows all the other
nodes in its layer. This is more information than needed, since a node only has to know the other nodes in the
layer to which it is connected.)

The overall process by which each node determines its secondary routing table is shown in the flow chart
of Fig. 21. Initially, in step 2101, network topology information, i.e., the identity of nodes that neighbor the current
node, is determined from the exclusionary tree routing information already available in the node. If another
technique is used to generate the primary route, it is nevertheless assumed that this topology information is at hand.
Likewise, in step 2102, the weights w(i,j) associated with the paths between the current node and its neighbors
are computed from V,H coordinates if internodal distance is used as the weighting criterion, as described above.
Otherwise, the appropriate weights are stored in the node.

Next, for a destination D, the network of nodes is organized into layers in step 2103, using the process
described in more detail in Figs. 22 and 23. As stated previously, each layer contains all nodes having the same
distance, in hops, to the destination, using the shortest available path.

In step 2104 an alternate route to destination D is generated, using the process described in more detail
in Figs. 24-25. This process is distributed, i.e., it is performed in each node independently.

After the alternate route for a given destination has been determined, a decision is made in step 2105 as
to whether all destinations have been processed. If not, steps 2103 and 2104 are repeated for the other
destinations. If routes for all destinations have been determined, the process stops in step 2106.

The layer generation or organization step 2103 of Fig. 21 is illustrated in more detail in Figs. 22 and 23.
Initially, in step 2201, the identities of the neighbors of the current node i are stored in a memory or other
suitable storage device. This information would be available at each node if the exclusionary tree routing
process is used. In step 2202 the shortest distance (k) in hops between i and the destination node D is computed.
This information is also available as a result of the exclusionary tree routing process. It should be noted,
however, that any other distributed shortest path algorithm can be used for primary route selection and any such
algorithm would give us the shortest distance in hops. A similar procedure is then repeated for each neighbor
j of i, in step 2203, to determine the distance m in hops between j and D. When the results of steps 2202 and
2203 are both available, a comparison between m and k is made in steps 2204 and 2214. If m and k are
determined to be equal in step 2204, then it is concluded that j and i are in the same layer k (step 2205), and this
information is stored (step 2206). If it is determined that m = k - 1 in step 2214, then it is concluded that j is in
layer k-1 (step 2215) and this information is stored (step 2216). If the results of steps 2204 and 2214 indicate
that m does not equal k or k-1, then it is concluded that j is in a more distant layer k+1 from D (step 2225). This
information is not needed, and is therefore discarded in step 2226.

The layering process is further described in Fig. 23, which is a continuation of Fig. 22. After a particular
neighbor j of the current node i has been examined to determine whether it is in the same layer k, a closer
layer k-1, or a more distant layer k+1, a determination is made in step 2230 as to whether all neighbors j of
node i have been examined. If not, the portion of the process beginning at step 2203 is repeated. After all
neighbors of node i have been examined, a determination is made in step 2240 as to whether all destinations
D have been examined. If not, the entire layer generation process, beginning at step 2202, is repeated for the
next destination. When all destinations have been examined, the layering process is stopped in step 2250.
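
The comparison made in steps 2204 and 2214 can be sketched as a small classification routine for one destination. The function name and the sample hop counts are hypothetical; the hop counts themselves are assumed to be available already from primary route selection, as the text states.

```python
def classify_neighbors(k, neighbor_hops):
    """Split the neighbors of node i by layer, given hop counts to one destination.

    k:             hop count from node i to the destination (step 2202)
    neighbor_hops: dict mapping each neighbor j to its hop count m (step 2203)
    Returns (same_layer, closer_layer).
    """
    same_layer, closer_layer = [], []
    for j, m in sorted(neighbor_hops.items()):
        if m == k:            # step 2204: j is in layer k
            same_layer.append(j)
        elif m == k - 1:      # step 2214: j is in layer k-1
            closer_layer.append(j)
        # Otherwise j is in the more distant layer k+1 and is discarded (step 2226).
    return same_layer, closer_layer


# Hypothetical hop counts for four neighbors of a node that is 2 hops from D.
print(classify_neighbors(2, {"a": 1, "b": 2, "c": 3, "d": 2}))
```
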

The alternate route generation process of step 2104 of Fig. 21 is described in more detail in Figs. 24 and
25. Initially, for each destination D, all neighbors of node i that are in the same layer k as i (with respect to
destination D) are stored in step 2401. This step thus uses the layering information previously obtained from the
procedure described above in connection with Figs. 22 and 23. In a similar manner, all neighbors of node i in
layer k-1 (with respect to destination D) are also stored, in step 2402. After information regarding the
neighboring nodes has been stored, a determination is made in step 2403 of the weight w(i,j) associated with the link
between nodes i and j within layer k. Similarly, for each neighbor j' of node j in layer k-1, a determination is
made in step 2404 of the weight w(j,j') associated with the link between node j in layer k and node j' in layer
k-1. The sum of the weights w(i,j) and w(j,j') is next computed and stored in step 2405. At this point, a
determination is made in step 2406 as to whether all neighbors j' of node j have been examined. If not, steps 2404
and 2405 are repeated for the next neighbor j'. After all neighbors j' have been examined, a determination is
made in step 2407 (now referring to Fig. 25) as to whether all neighbors j of node i have been examined. If not,
the computation process beginning with step 2403 is repeated. After all neighbors j of node i have been
examined, the stored combined weights are processed in step 2408 to select nodes j and j' such that w(i,j)+w(j,j')
is a minimum. This minimum value determines the specific node j that is the alternate route for traffic from node
i that is destined for node D.

After the alternate route for a specific destination D has been computed, a determination is made in step
2409 as to whether all destinations D have been processed. If not, the alternate route generation process
beginning at step 2401 is repeated, so that a table of alternate routes, one for each destination, can be formed.
When all destinations D have been processed, the process is stopped in step 2410.
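
The selection of steps 2403 through 2408 amounts to a minimization over pairs (j, j'). The sketch below illustrates it for one node and one destination. The argument names (`same_layer`, `closer_of`, `weight`) and the sample weights are assumptions made for the example, and the layering results are taken as already available.

```python
def select_alternate(i, same_layer, closer_of, weight):
    """Pick the same-layer neighbor j that minimizes w(i,j) + w(j,j').

    same_layer: neighbors j of node i in layer k (step 2401)
    closer_of:  for each j, its neighbors j' in layer k-1
    weight:     link weights keyed by (node, node)
    Returns (combined weight, j, j'), or None if no candidate exists.
    """
    best = None
    for j in same_layer:                                   # loop closed by step 2407
        for j_prime in closer_of[j]:                       # loop closed by step 2406
            total = weight[(i, j)] + weight[(j, j_prime)]  # steps 2403-2405
            if best is None or total < best[0]:
                best = (total, j, j_prime)                 # step 2408: keep the minimum
    return best

# Hypothetical weights for a node "i" with two same-layer neighbors.
weight = {
    ("i", "j1"): 4, ("j1", "p"): 3,
    ("i", "j2"): 2, ("j2", "p"): 6, ("j2", "q"): 4,
}
closer_of = {"j1": ["p"], "j2": ["p", "q"]}
print(select_alternate("i", ["j1", "j2"], closer_of, weight))
```

Because the first hop stays within layer k and the second hop moves to layer k-1, the alternate path is exactly one hop longer than a shortest path, which is what the layering is meant to guarantee.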

In order to avoid the spread of congestion caused by alternate routing, another feature of our invention is
the marking of a bit in the header of all packets that are routed on the alternate path. At all nodes in the
alternate path, marked packets are given lower loss priority. This means that if the buffer occupancy at these
nodes is below a preset threshold, then the marked packet is admitted; otherwise it is discarded. If the alternate
path is also busy, then the alternate routed traffic is dropped and the spread of congestion is avoided. This
process is illustrated in Fig. 26.

For each link outgoing from a node, a periodic measurement is made in step 2601 of the occupancy "x" of
the buffer associated with that link. If x is determined to be less than a threshold value T^ in step 2602, traffic
on that link is uncongested, so that the uncongested routing table is selected in step 2603. If x is also less than
a threshold value Taccp, both marked and unmarked packets are accepted for transmission over that link.
However, if x is greater than or equal to Taccp, traffic on the link is relatively heavy (but still uncongested). In
that circumstance, only unmarked packets are accepted in step 2607 for transmission over that link.

If it is determined in step 2602 that buffer occupancy x is equal to or greater than T^, the alternate route
is selected in step 2604. However, in this event, only a preselected fraction of packets are actually diverted from
the primary route and routed on the alternate link. A test is next performed in step 2620 to determine if the
outgoing link selected by alternate routing is connected to the same node as the node from which the packet was
received. This test is performed in order to avoid single link loops, and is based on information relating to
incoming and outgoing links that is readily available in the node. If the test result is positive, so that a single
link loop would be created, the packet is instead dropped or discarded in step 2621. Otherwise, a determination
is made in step 2608 as to whether the alternate trunk was used to route the packet to the next node. If yes, the
marking of that packet occurs in step 2609, so that the status of that packet as one having been alternate routed
will be recognized in succeeding nodes. On the other hand, if alternate routing is not used, the packet is not
marked (step 2610).
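
One possible reading of the per-packet decision of Fig. 26 is sketched below. It is an illustration under stated assumptions rather than the patented logic. The names `t_congest` and `t_accept` stand in for the two thresholds. `divert_fraction` is the preselected fraction of packets diverted. The `occupancy` mapping is a hypothetical stand-in for the measured buffer occupancy of the link toward each neighbor. The sketch also assumes that the acceptance test for marked packets is applied at the diverting node as well as at downstream nodes.

```python
import random
from dataclasses import dataclass

@dataclass
class Packet:
    dest: str
    marked: bool = False   # loss priority bit, set once the packet is alternate routed

def route(packet, came_from, primary, alternate, occupancy,
          t_congest, t_accept, divert_fraction, rng=random.random):
    """Return the next hop for the packet, or None if it is dropped."""
    hop = primary[packet.dest]
    # Step 2602: is the primary link congested? If so, divert only a fraction.
    if occupancy[hop] >= t_congest and rng() < divert_fraction:
        alt = alternate[packet.dest]          # step 2604
        if alt == came_from:                  # step 2620: single link loop test
            return None                       # step 2621: drop
        packet.marked = True                  # step 2609
        hop = alt
    # Marked packets have lower loss priority: accept only below t_accept.
    if packet.marked and occupancy[hop] >= t_accept:
        return None
    return hop

primary = {"D": "P"}      # hypothetical routing tables for one destination
alternate = {"D": "Q"}
pkt = Packet(dest="D")
hop = route(pkt, "X", primary, alternate, {"P": 80, "Q": 10},
            t_congest=70, t_accept=50, divert_fraction=1.0, rng=lambda: 0.0)
print(hop, pkt.marked)
```

Overflow of a completely full buffer is not modeled here. Passing `rng` explicitly keeps the example deterministic, whereas a real node would draw it at random or use a counter.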

In accordance with another aspect of our invention, we have found it advantageous to use link buffer
occupancy as a measure of link congestion, to determine when alternate routing should be applied. The activation
and deactivation of alternate routing, as well as the decision to accept or reject an alternate routed cell, would
then be based upon measurements of link buffer occupancy. Various specific buffer monitoring techniques can
be used for this purpose, depending upon implementational convenience. For example, since link buffer
occupancy fluctuates at great speed, it can be measured every millisecond. A running average of the 1000 most
recent measurements can then be used to monitor congestion. When the average buffer occupancy exceeds
a predetermined congestion threshold, some of the traffic is alternate routed, and these packets are marked
by setting a loss priority bit in the header.
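
The running-average measurement described above can be sketched as a fixed-size sliding window. The class and method names are assumptions made for illustration, and the window of 1000 samples simply mirrors the example figure in the text.

```python
from collections import deque

class OccupancyMonitor:
    """Running average of the most recent buffer occupancy samples."""

    def __init__(self, window=1000):
        self.samples = deque(maxlen=window)
        self.total = 0

    def record(self, occupancy):
        """Add one sample (e.g. taken every millisecond); return the average."""
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]     # oldest sample is about to be evicted
        self.samples.append(occupancy)
        self.total += occupancy
        return self.total / len(self.samples)

    def is_congested(self, threshold):
        """True when the average occupancy exceeds the congestion threshold."""
        if not self.samples:
            return False
        return self.total / len(self.samples) > threshold

monitor = OccupancyMonitor(window=3)          # small window for demonstration
for sample in (10, 20, 30, 40):
    average = monitor.record(sample)
print(average, monitor.is_congested(25))
```

Keeping a running total makes each update constant time, which matters if a sample is taken every millisecond on every outgoing link.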

Fig. 27 illustrates, in simplified form, the functional architecture for a typical node 2701. As shown in that
figure, node 2701 interconnects a series of incoming links 2710-2712 with a series of outgoing links 2720-2722.
Links 2710-2712 and links 2720-2722 may in some implementations each be one or more high speed data
trunks. Input buffers 2715-2717 receive packets applied on links 2710-2712, respectively, and apply the
packets to a nodal processor 2730 to be described below. Likewise, output buffers 2725-2727 receive packets
output from nodal processor 2730 that are destined for links 2720-2722, respectively. The occupancy or fullness
of output buffers 2725-2727 is monitored in a congestion monitor 2740, which is part of nodal processor 2730,
to determine when one or more links 2720-2722 is congested. The output of congestion monitor 2740 controls
nodal processor 2730 such that a primary route to a destination is selected from table 2750 in the absence of
congestion and an alternate route to a destination is selected from table 2760 in the presence of congestion.
Nodal processor 2730 also includes a single link loop avoidance processor 2770, which is activated when
congestion routing is used. The purpose of this processor is to assure that a packet originating at a neighboring
node is not sent back to that node, so as to avoid forming a single link loop. This may be accomplished by
keeping track of the input link on which a packet is received, and dropping the packet (i.e., not transmitting it)
if the congested route specified by congested routing table 2760 is on a link back to the same node.

A more complete functional description of the arrangement of nodal processor 2730 is contained in Fig.
28. Nodal processor 2730 contains a central processing unit (CPU) 2810 and a memory 2850 having several
portions. The network layering information that results from the process illustrated in Figs. 22 and 23 is stored
for each destination node in portion 2802 of memory 2850, while network topology information is stored in
another portion 2801 of the same memory 2850. Weights corresponding to different node pairs are also stored
in the same portion of memory 2850.

Whenever there is a change in the network topology, the new network layering is calculated for each
destination node and stored in portion 2802. CPU 2810 then uses the network layering information and the weight
information to compute primary and alternate paths, which are stored in portions 2820 and 2830 of memory
2850. Persons skilled in the art will recognize that various implementations for CPU 2810 and memory 2850
are readily available.

The benefits afforded by our alternate routing technique can be illustrated using a simple 3 node model of
Fig. 29, which permits computation of end-to-end blocking with and without alternate routing for various offered
loads. Based on this analysis, we have determined that alternate routing provides very significant improvements
in end-to-end blocking.

In Fig. 29, node 2901 has two traffic streams, one destined for node 2902 and the other for node 2903.
The traffic destined for node 2902 has a mean arrival rate of λ12 and the traffic destined for node 2903 has a
mean arrival rate of λ13. Node 2902 has a single traffic stream with mean arrival rate λ23 destined for node 2903.
Let us suppose that n12, n13 and n23 are buffers in which cells from the traffic streams corresponding to λ12,
λ13 and λ23 queue up for service. All queues are first-in, first-out (FIFO). All arrivals are assumed Poisson and all
service times are exponential. It is assumed that there is no receive buffer overflow and, hence, we do not model
the receive buffers.

Using this model, the impact of alternate routing on the λ13 traffic is examined by subjecting a fraction of
the λ13 traffic to alternate routing so that, if the occupancy in buffer n13 exceeds a certain specified threshold,
called the rejection threshold, the alternate routable fraction of λ13 is offered to buffer n12 for transmission
through node 2902. Buffer n12 accepts the alternate routed traffic only if its occupancy is below a specified
threshold, called the acceptance threshold; if not, it gets rejected and is lost. Once the alternate routed traffic
reaches node 2902, it is accepted by buffer n23 only if its occupancy is below the acceptance threshold. It is
important to note that the node 2901 to 2902 and node 2902 to 2903 traffic streams are not subject to alternate
routing. This is because we wish to study the impact of alternate routing on the end-to-end blocking of the node
2901 to 2903 traffic, as we increase λ13 while keeping λ12 and λ23 constant. The queueing model corresponding
to the network in Fig. 29 is shown in Fig. 30.

In Fig. 30, λ13d denotes the direct routed component of λ13, and λ13a denotes the alternate routable portion
of λ13. The overall arrival rate for the node 2901 to 2903 traffic is λ13 = λ13d + λ13a. Using a birth-death process
model, we have derived exact expressions for the end-to-end blocking suffered by the three traffic classes. In
our analysis, we assumed a buffer size of 100 for n12, n13 and n23, since it yields a cell blocking probability of
roughly 10^-6 at an offered load of 0.9. We chose the rejection threshold to be 70 and the acceptance threshold
to be 50. This means that whenever the occupancy of buffer n13 exceeds 70, the cells from the λ13a stream
are alternate routed to buffer n12. Buffer n12 accepts the alternate routed λ13a cells only if its occupancy is below
50. Similarly, the alternate routed cells are accepted by buffer n23 for transmission to node 2903 only if the
occupancy at buffer n23 is below 50. All cells that are not accepted are lost. In this simple model, we have not
accounted for message retransmission. We kept the offered load due to λ12 and λ23 constant at 0.8 and varied
the offered load due to λ13 from 0.5 to 2.0. Of the node 2901 to 2903 traffic, 25% was subject to alternate
routing. The end-to-end blocking suffered by the node 2901 to 2903 traffic at these various loads, with and
without alternate routing, is shown in Fig. 31. Curve 3101 gives the blocking probability without alternate routing
and curve 3102 gives the blocking probability with alternate routing. From Fig. 31, it is clear that there is
substantial improvement in end-to-end blocking with alternate routing. Fig. 31 does not exhibit the sharp increase
in blocking that normally occurs with other alternate routing techniques that do not mark packets to avoid the
spread of congestion, as advantageously provided in our invention. Fig. 32 is a rescaled version of Fig. 31
showing the end-to-end blocking experienced by the node 2901 to 2903 traffic when the offered load ranges from
0.8 to 1.2. Again, curve 3201 represents blocking probability without alternate routing and curve 3202 represents
blocking probability with alternate routing. Fig. 32 clearly shows the dramatic improvement in end-to-end
blocking for the node 2901 to 2903 traffic over a range of offered load of practical interest. Because direct routed
traffic is given priority (alternate routed traffic is accepted only if the buffer occupancy is below 50), the node
2901 to 2902 and node 2902 to 2903 traffic suffer no significant performance degradation, even when the offered
load due to the node 2901 to 2903 traffic is 2.0. The end-to-end blocking for the node 2901 to 2902 and node
2902 to 2903 traffic remains virtually at zero.
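
As a rough sanity check on the single-buffer blocking figure quoted for a buffer size of 100 at an offered load of 0.9, the standard M/M/1/K birth-death result can be evaluated directly. This is a sketch under an assumption: it treats each buffer in isolation as an M/M/1/K queue with K = 100 (Poisson arrivals, exponential service, no alternate routing), which is a simplification and not the three-class analysis behind Figs. 31 and 32.

```python
def mm1k_blocking(rho, capacity):
    """Blocking probability of an M/M/1/K queue at offered load rho.

    P_block = (1 - rho) * rho**K / (1 - rho**(K + 1)), with K = capacity.
    For rho == 1 the stationary distribution is uniform over K + 1 states.
    """
    if rho == 1.0:
        return 1.0 / (capacity + 1)
    return (1.0 - rho) * rho**capacity / (1.0 - rho**(capacity + 1))

# Offered load of 0.9 with a buffer of 100, as in the analysis above.
print(mm1k_blocking(0.9, 100))
# Under sustained overload the isolated queue blocks about half the cells.
print(mm1k_blocking(2.0, 100))
```

At a load of 0.9 the formula gives a value on the order of 10^-6, while at a load of 2.0 it approaches 0.5, which illustrates why an overloaded direct route benefits from diverting part of its traffic.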

In summary, the congestion control scheme in accordance with our invention has the following properties:

(a) guarantees loop-freedom;

(b) reacts to measurements and changes paths dynamically;

(c) needs local measurements only;

(d) does not spread congestion; and

(e) carries traffic on lightly loaded links.

Indeed, the invention allows a connectionless network to efficiently carry as much traffic as possible, since
packet loss that ordinarily results from buffer overflow is reduced, and the retransmission problem is alleviated.
No additional signaling messages need to be exchanged between network nodes.

Various modifications and adaptations of the present invention will be readily apparent to those of ordinary
skill in the art. Accordingly, it is intended that the invention be limited only by the appended claims.



Claims

1. A method of routing information packets from a first node in a network of interconnected nodes to a
destination node, comprising the steps of

a) forming a first routing table containing the primary route to be taken by information packets at said
first node destined for said destination node and a second routing table containing an alternate route
to be taken by information packets when said primary route is congested;

b) monitoring congestion in said network; and

c) routing a portion of said information packets over said alternate route in the presence of congestion;

wherein said second routing table is formed by

d) determining other nodes in said network that are interconnected with said first node;

e) organizing each of said interconnected nodes including said first node into a series of layers in
accordance with their distance, in hops, to said destination node;

f) assigning a weight to each possible path between said first node and each of said other interconnected
nodes in the same layer;

g) assigning a weight to each possible path between each of said other interconnected nodes in said
same layer and a connected node in a different layer, said different layer being closer to said destination
node; and

h) selecting said alternate route by minimizing the pairwise sum of the weights obtained during said first
and second assigning steps (f) and (g) above.

2. The method of claim 1 wherein said weight assigning steps include computing the distance between nodes 
using coordinate information representing the location of said nodes. 


3. A method of controlling congestion in the flow of information bearing packets traveling over paths in a
network of interconnected nodes, comprising the steps of

routing packets from each node to destination nodes via multihop primary routing paths;
monitoring congestion in said nodes in said network; and
routing packets from ones of said nodes to said destinations via alternate multihop routing paths
in the event that congestion is encountered in said network;
wherein said alternate routing paths are determined by
grouping said interconnected nodes into a plurality of layers, each layer containing nodes that are
the same distance, in hops, from a particular destination;
assigning a weighting factor to each path between interconnected nodes in said layers;
assigning a weighting factor to each path between interconnected nodes in adjacent layers; and
selecting said alternate routing paths as a function of combinations of said weighting factors.

4. The invention defined in claim 3 wherein said primary path contains k hops and said alternate path contains
at least k+1 hops.

5. The invention defined in claim 3 wherein said selecting step includes
forming the pairwise sum of weighting factors assigned during both of said assigning steps.

6. The invention defined in claim 3, wherein said assigning step includes:
forming said weighting factor as a function of the distance between nodes connected via said paths.

7. The invention defined in claim 3, wherein said alternate route is used only for a portion of the packets
intended for a congested primary routing path.

8. The invention defined in claim 7, wherein said method further includes the steps of
marking any packet transmitted over an alternate routing path;
examining each packet at each node before it is routed, to determine if it has been marked; and
routing marked packets only if said node is uncongested.


9. A method of selecting loop free alternate multi-hop paths for information bearing packets traveling over a
network of communication nodes, comprising the steps of

storing in each of said communication nodes information describing the connections between each
node in said network and neighboring nodes;
storing in each of said communication nodes information for assigning weights assigned to paths
between each connected pair of nodes;
grouping interconnected nodes into k layers, each layer containing nodes having the same
distance, in hops, from a potential destination;
computing, for each node in layer k, the pairwise sum of the stored weights assigned to a) paths
between said node and a first set of connected nodes in layer k; and b) paths between said first set of
connected nodes in layer k and a second set of connected nodes in layer k-1, and
selecting as the alternate route from said node in layer k to said potential destination, the path
having the smallest of said pairwise sums.

10. The invention defined in claim 9, wherein said first storing step includes forming an adaptive minimum
spanning tree representation of said network.

11. The invention defined in claim 9, wherein said second storing step includes storing coordinate information
representing the horizontal and vertical location of each of said nodes with respect to a reference system.

12. A method of reducing congestion in a connectionless network including a plurality of interconnected nodes,
comprising the steps of

associating with each node in said network, a primary route to be taken by at least a portion of traffic
from said node destined for each destination node;
associating with each node in said network, an alternate route to be taken by traffic from said node
destined for each destination node in the event that said primary route is congested;
monitoring congestion in said network, and
routing traffic on said alternate route in the event that congestion is detected;
wherein said first association step includes forming a k-hop route using adaptive minimum spanning
tree routing; and
wherein said second association step includes forming a route having at least k+1 hops, based upon
connectivity information locally available in said each node.

13. In a network of interconnected nodes in which packets are transmitted over a primary route determined
by selecting the shortest path, in hops, between the originating node and the destination node, a method of
providing an alternate route in the event said primary route is congested, said method comprising the steps
of

grouping nodes between said originating node and said destination node into a plurality of groups,
such that the nodes in the kth group are equally distant, in hops, from said destination node;
assigning a weight w(i,j) to each path between nodes i and j in group k and a weight w(j,j') to each
path between node j in group k and node j' in group k-1,
selecting said alternate path such that w(i,j)+w(j,j') is minimized.



14. A method of determining an alternate route for traffic in a connectionless network of nodes when the
primary route between said nodes is congested, comprising the steps of

for each destination, grouping said nodes as a function of the distance of said node from said
destination;
assigning a first weighting factor to each path between a node in one of said groups and each
connected node in the same group, and a second weighting factor to each path between each of said
connected nodes in the same group and other connected nodes in another of said groups; and
selecting said alternate route as a function of said first and second routing factors.

15. Apparatus for controlling congestion in the flow of information bearing packets traveling over paths in a
network of interconnected nodes, comprising

a) means for monitoring congestion in primary and secondary routing paths within said network; and

b) means for routing packets from each node to destination nodes via multihop primary routing paths
in the absence of congestion and for routing packets from ones of said nodes to said destinations via
alternate multihop routing paths in the event that congestion is encountered in said primary routing
paths;

wherein said routing means includes
means for grouping said interconnected nodes into a plurality of layers, each layer containing nodes
that are the same distance, in hops, from a particular destination;
means for assigning a weighting factor to each path between interconnected nodes in said layers,
and for assigning a weighting factor to each path between interconnected nodes in adjacent layers; and
means for selecting said alternate routing paths as a function of combinations of said weighting
factors.

16. The invention defined in claim 15 wherein said primary path contains k hops and said alternate path
contains at least k+1 hops.

17. The invention defined in claim 15 wherein said selecting means includes
means for forming pairwise sums of said weighting factors for

a) paths between nodes in the same layer, and

b) paths between nodes in adjacent layers.

18. The invention defined in claim 15, wherein said assigning means includes:
means for forming said weighting factor as a function of the distance between nodes connected
via said paths.

19. The invention defined in claim 15, wherein said routing means is arranged so that said alternate route is
used only for a portion of the packets intended for a congested primary routing path.

20. The invention defined in claim 19, wherein said apparatus further includes
means for marking any packet transmitted over an alternate routing path;
means for examining each packet at each node before it is routed, to determine if it has been
marked; and
means for routing marked packets only if said node is uncongested.

21. Apparatus for selecting loop free alternate multi-hop paths for information bearing packets traveling over
a network of communication nodes, comprising

means for storing in each of said communication nodes (a) information describing the connections
between each node in said network and neighboring nodes, and (b) information for assigning weights to
paths between each connected pair of nodes;
means for grouping interconnected nodes into k layers, each layer containing nodes having the
same distance, in hops, from a potential destination;
means for computing, for each node in layer k, the pairwise sum of the stored weights assigned to
a) paths between said node and a first set of connected nodes in layer k; and b) paths between said first
set of connected nodes in layer k and a second set of connected nodes in layer k-1, and
means for selecting as the alternate route from said node in layer k to said potential destination,
the path having the smallest of said pairwise sums.

22. Apparatus for reducing congestion in a connectionless network including a plurality of interconnected
nodes, comprising

means for associating with each node in said network a) a primary route to be taken by at least a
portion of traffic from said node destined for each destination node, and b) an alternate route to be taken
by traffic from said node destined for each destination node in the event that said primary route is
congested;
means for monitoring congestion in said network, and
means for routing traffic on said alternate route in the event that congestion is detected;
wherein said associating means includes (a) means for forming a k-hop route using adaptive
minimum spanning tree routing, and (b) means for forming a route having at least k+1 hops based upon
connectivity information locally available in said each node.

23. In a network of interconnected nodes in which packets are transmitted over a primary route determined
by selecting the shortest path, in hops, between the originating node and the destination node, apparatus
for providing an alternate route in the event said primary route is congested, said apparatus comprising

means for grouping nodes between said originating node and said destination node into a plurality
of groups, such that the nodes in the kth group are equally distant, in hops, from said destination node;
means for assigning a weight w(i,j) to each path between nodes i and j in group k and a weight
w(j,j') to each path between node j in group k and node j' in group k-1, and
means for selecting said alternate path such that w(i,j)+w(j,j') is minimized.


24. Apparatus for determining an alternate route for traffic in a connectionless network of nodes when the
primary route between said nodes is congested, comprising

for each destination, means for grouping said nodes as a function of the distance of said node from
said destination;
means for assigning a first weighting factor to each path between a node in one of said groups and
each connected node in the same group, and a second weighting factor to each path between each of
said connected nodes in the same group and other connected nodes in another of said groups; and
means for selecting said alternate route as a function of said first and second weighting factors.

FIG. 21
FIG. 24
FIG. 25